Abstract. This article presents several alternatives to Pearson’s correlation coefficient, 
with many examples. In samples where the rank of a discrete variable matters more 
than its values, a mixture of Pearson’s and Spearman’s coefficients gives a better result. 

Introduction 



Let’s consider a bivariate sample, which consists of $n > 2$ pairs $(x, y)$. We denote these 
pairs by: 

$(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n),$ 

where $x_i$ = the value of $x$ for the $i$-th observation, 
and $y_i$ = the value of $y$ for the $i$-th observation, 
for any $1 \le i \le n$. 

We can construct a scatter plot in order to detect any relationship between variables $x$ and 
$y$, drawing a horizontal x-axis and a vertical y-axis, and plotting the points of coordinates 
$(x_i, y_i)$ for all $i \in \{1, 2, \ldots, n\}$. 

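We use the standard statistical notation, mostly used in regression analysis: 

$$\sum x = \sum_{i=1}^{n} x_i, \quad \sum y = \sum_{i=1}^{n} y_i, \quad \sum xy = \sum_{i=1}^{n} x_i y_i,$$

$$\sum x^2 = \sum_{i=1}^{n} x_i^2, \quad \sum y^2 = \sum_{i=1}^{n} y_i^2, \qquad (1)$$

$$\bar{x} = \frac{\sum x}{n} = \text{the mean of sample variable } x,$$

$$\bar{y} = \frac{\sum y}{n} = \text{the mean of sample variable } y.$$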
Let’s introduce a notation for the median: 

$x_M$ = the median of sample variable $x$, 
$y_M$ = the median of sample variable $y$. (2) 



Correlation Coefficients. 



The correlation coefficient of variables $x$ and $y$ shows how strongly the values of these 
variables are related to one another. It is denoted by $r$, and $r \in [-1, 1]$. 

If the correlation coefficient is positive, then the two variables tend to increase together 
(or decrease together). 

If the correlation coefficient is negative, then one variable tends to decrease when the other 
increases, and vice versa. 

Therefore, the correlation coefficient measures the degree of linear association between two 
variables. 

We have a strong relationship if $r \in [0.8, 1]$ or $r \in [-1, -0.8]$; 

a moderate relationship if $r \in (0.5, 0.8)$ or $r \in (-0.8, -0.5)$; (3) 

and a weak relationship if $r \in [-0.5, 0.5]$. 
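
The thresholds in (3) can be sketched as a small Python helper. The function name `relationship_strength` is our own choice for illustration, not something defined in the article; the interval boundaries are taken directly from (3).

```python
def relationship_strength(r: float) -> str:
    """Classify a correlation coefficient using the thresholds in (3)."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie in [-1, 1]")
    magnitude = abs(r)
    if magnitude >= 0.8:      # r in [0.8, 1] or [-1, -0.8]
        return "strong"
    if magnitude > 0.5:       # r in (0.5, 0.8) or (-0.8, -0.5)
        return "moderate"
    return "weak"             # r in [-0.5, 0.5]


for r in (0.95075, -0.65, 0.5, 0.0):
    print(r, relationship_strength(r))
```

Because the classification depends only on the absolute value of r, the sign (direction of the relationship) has to be read separately.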

The correlation coefficient depends neither on the measurement unit nor on the order of 
the variables: (x, y) or (y, x). 

If r = 1 or -1, then there is a perfectly linear relationship between x and y. If r = 0, or 
close to zero, then there is not a strong linear relationship, but there might be a strong 
non-linear relationship that can be checked on the scatter plot. 

The coefficient of determination, denoted by $r^2$, represents the proportion of variation in $y$ 
due to a linear relationship between $x$ and $y$ in the sample: 

$$r^2 = \frac{SSTo - SSResid}{SSTo} = 1 - \frac{SSResid}{SSTo} \qquad (4)$$

where $SSTo$ = total sum of squares $= \sum (y - \bar{y})^2 = \sum_{i=1}^{n} (y_i - \bar{y})^2$ (5) 

and $SSResid$ = residual sum of squares $= \sum (y - \hat{y})^2 = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$ (6) 

with $\hat{y}_i$ = the $i$-th predicted value $= a + b x_i$ for $i \in \{1, 2, \ldots, n\}$, 

resulting from substituting each sample $x$ value into the equation of the least-squares line 

$$\hat{y} = a + bx \qquad (7)$$

where 

$$b = \frac{\sum xy - \left[(\sum x)(\sum y)/n\right]}{\sum x^2 - \left[(\sum x)^2/n\right]} \qquad (8)$$

and $a = \bar{y} - b\bar{x}$. 

Obviously: coefficient of determination = (correlation coefficient)$^2$. 

Two sample correlation coefficients are well-known: 

1) Pearson’s sample correlation coefficient, let’s denote it by $r_p$: 

$$r_p = \frac{\sum xy - \left[(\sum x)(\sum y)/n\right]}{\sqrt{\sum x^2 - \left[(\sum x)^2/n\right]} \cdot \sqrt{\sum y^2 - \left[(\sum y)^2/n\right]}} \qquad (9)$$

which is the most popular; 

and 2) Spearman’s rank correlation coefficient, let’s denote it by $r_s$, which is obtained 
from the previous one by replacing, for each $i \in \{1, 2, \ldots, n\}$, $x_i$ by its rank in the variable 
$x$, and similarly for $y_i$. 
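
The two definitions above can be sketched in Python as follows. The function names (`pearson_r`, `ranks`, `spearman_r`) and the sample values are hypothetical, chosen only for illustration. Giving tied values their average rank is an assumption on our part, since the article does not discuss ties.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's r computed from the raw sums, as in formula (9)."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need two samples of equal length n > 2")
    sx, sy = sum(xs), sum(ys)
    s_xy = sum(a * b for a, b in zip(xs, ys)) - sx * sy / n
    s_xx = sum(a * a for a in xs) - sx * sx / n
    s_yy = sum(b * b for b in ys) - sy * sy / n
    # Note: a constant variable makes the denominator zero.
    return s_xy / (sqrt(s_xx) * sqrt(s_yy))


def ranks(values):
    """Rank 1 for the smallest value; tied values share the average rank
    (an assumption, the article does not cover ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average = (start + end) / 2 + 1  # mean of ranks start+1 .. end+1
        for position in range(start, end + 1):
            result[order[position]] = average
        start = end + 1
    return result


def spearman_r(xs, ys):
    """Spearman's r_s: Pearson's formula applied to the ranks."""
    return pearson_r(ranks(xs), ranks(ys))


# Hypothetical sample: y grows monotonically but not linearly with x.
xs = [1, 2, 3, 4, 5]
ys = [1, 4, 9, 16, 25]
print(round(pearson_r(xs, ys), 5), round(spearman_r(xs, ys), 5))
```

For a monotone but non-linear sample like this one, the rank-based coefficient reaches 1 while Pearson’s stays below it, which is the difference between the two definitions.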



We propose more alternative sample correlation coefficients in the following 
ways, replacing in Pearson’s formula (9): 

3.1. Each $x_i$ by its deviation from the $x$ mean: $x_i - \bar{x}$, 
and each $y_i$ by its deviation from the $y$ mean: $y_i - \bar{y}$. 

3.2. Each $x_i$ by its deviation from the $x$ minimum: $x_i - x_{min}$, and each $y_i$ by its deviation 
from the $y$ minimum: $y_i - y_{min}$. 

3.3. Each $x_i$ by its deviation from the $x$ maximum: $x_{max} - x_i$, and each $y_i$ by its deviation 
from the $y$ maximum: $y_{max} - y_i$. 

3.4. Each $x_i$ by its deviation from a given $x_k$ (for $k \in \{1, 2, \ldots, n\}$): $x_i - x_k$, 
and each $y_i$ by its deviation from the corresponding given $y_k$: $y_i - y_k$. 




Not surprisingly, all these four alternative sample correlation coefficients are equal to 
Pearson’s, since they simply correspond to translations of the Cartesian axes, whose origin 
(0, 0) is moved to $(\bar{x}, \bar{y})$, $(x_{min}, y_{min})$, $(x_{max}, y_{max})$, or $(x_k, y_k)$ respectively 
(in case 3.3 both axes are also reversed, which leaves the coefficient unchanged). 
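
The invariance under translation can be checked directly on the numerator of formula (9). The short derivation below is our own worked step, with $c$ and $d$ standing for arbitrary constants (symbols not used in the article).

```latex
% Shift both variables by arbitrary constants c and d:
%   x'_i = x_i - c,   y'_i = y_i - d
\begin{aligned}
\sum x'y' - \frac{(\sum x')(\sum y')}{n}
  &= \sum (x_i - c)(y_i - d) - \frac{(\sum x - nc)(\sum y - nd)}{n} \\
  &= \Big(\sum xy - d\sum x - c\sum y + ncd\Big)
     - \Big(\frac{(\sum x)(\sum y)}{n} - d\sum x - c\sum y + ncd\Big) \\
  &= \sum xy - \frac{(\sum x)(\sum y)}{n}.
\end{aligned}
```

Taking $y = x$ and $d = c$ in the same computation shows that $\sum x^2 - (\sum x)^2/n$ is unchanged as well, and likewise for $y$, so every ingredient of formula (9) is unaffected by the shift. In case 3.3 the shifted values are additionally multiplied by $-1$ in both variables, which changes the sign of the numerator twice and leaves the denominator as it is.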



Example: Let the variables x, y be given below: 

x | 6 | 7 | 12 | 14 | 23 | 41 | 53 | 60 | 69 | 72 
y | 2.5 | 1.1 | 6.3 | 2.1 | 2.9 | 15.3 | 20.7 | 18.4 | 22 | 33 

Table 1 



and their scatter plot: 
Graph 1 



1) Calculating Pearson’s correlation coefficient: 

$\sum x$ = 357; $\bar{x}$ = 35.7; 

$\sum y$ = 124.3; $\bar{y}$ = 12.43; 

$\sum x^2$ = 18,989; 

$\sum y^2$ = 2,634.11; 

$\sum xy$ = 6,916.8; 

$r_p$ = 0.95075. 
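
The sums above can be reproduced with a few lines of Python. This is a minimal sketch using the Table 1 data; the variable names are ours. It also fits the least-squares line to check numerically that the coefficient of determination equals the square of Pearson’s coefficient.

```python
from math import sqrt

# Table 1 data
x = [6, 7, 12, 14, 23, 41, 53, 60, 69, 72]
y = [2.5, 1.1, 6.3, 2.1, 2.9, 15.3, 20.7, 18.4, 22, 33]
n = len(x)

# The five sums used in formula (9)
sum_x, sum_y = sum(x), sum(y)
sum_xx = sum(v * v for v in x)
sum_yy = sum(v * v for v in y)
sum_xy = sum(a * b for a, b in zip(x, y))

s_xy = sum_xy - sum_x * sum_y / n
s_xx = sum_xx - sum_x ** 2 / n
s_yy = sum_yy - sum_y ** 2 / n
r_p = s_xy / sqrt(s_xx * s_yy)

# Least-squares line y_hat = a + b*x and the coefficient of determination
b = s_xy / s_xx
a = sum_y / n - b * sum_x / n
ss_total = sum((v - sum_y / n) ** 2 for v in y)
ss_resid = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
r_squared = 1 - ss_resid / ss_total

print(sum_x, round(sum_y, 2), sum_xx, round(sum_yy, 2), round(sum_xy, 2))
print(round(r_p, 5), round(r_squared, 5), round(r_p ** 2, 5))
```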



2) Calculating Spearman’s rank correlation coefficient: 



x rank | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 
y rank | 3 | 1 | 5 | 2 | 4 | 6 | 8 | 7 | 9 | 10 

Table 2 



$\sum x = \frac{(1 + 10) \cdot 10}{2} = 11 \cdot 5 = 55$; 

$\sum y$ = 55; 

$\sum x^2$ = 385; 

$\sum y^2$ = 385; 

$\sum xy$ = 377; 
$r_s$ = 0.90303. 
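
As a sketch of the ranking step, the following Python code derives the ranks of Table 2 from the Table 1 data and recomputes Spearman’s coefficient. The helper `rank_without_ties` is a hypothetical name and assumes all values are distinct, which holds for this sample. The shortcut $1 - 6\sum d^2 / (n(n^2 - 1))$ is the usual textbook identity for untied ranks; it is not used in the article and is added here only as a cross-check.

```python
from math import sqrt

# Table 1 data
x = [6, 7, 12, 14, 23, 41, 53, 60, 69, 72]
y = [2.5, 1.1, 6.3, 2.1, 2.9, 15.3, 20.7, 18.4, 22, 33]
n = len(x)


def rank_without_ties(values):
    """Rank 1 = smallest value. Assumes all values are distinct."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]


rank_x = rank_without_ties(x)
rank_y = rank_without_ties(y)

# Pearson's formula (9) applied to the ranks
sum_rx, sum_ry = sum(rank_x), sum(rank_y)
sum_rxx = sum(r * r for r in rank_x)
sum_ryy = sum(r * r for r in rank_y)
sum_rxy = sum(a * b for a, b in zip(rank_x, rank_y))
r_s = (sum_rxy - sum_rx * sum_ry / n) / sqrt(
    (sum_rxx - sum_rx ** 2 / n) * (sum_ryy - sum_ry ** 2 / n)
)

# Cross-check with the shortcut for untied ranks
d_squared = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
r_s_shortcut = 1 - 6 * d_squared / (n * (n ** 2 - 1))

print(rank_x)
print(rank_y)
print(sum_rxy, round(r_s, 5), round(r_s_shortcut, 5))
```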



3.1) Replacing $x_i$ by $x_i - \bar{x}$ and $y_i$ by $y_i - \bar{y}$ for all $i$ (deviations from the mean): 

x | -29.7 | -28.7 | -23.7 | -21.7 | -12.7 | 5.3 | 17.3 | 24.3 | 33.3 | 36.3 
y | -9.93 | -11.33 | -6.13 | -10.33 | -9.53 | 2.87 | 8.27 | 5.97 | 9.57 | 20.57 

Table 3 



$\sum x = 0$, 

because $\sum x = \sum_{i=1}^{10} (x_i - \bar{x}) = x_1 - \bar{x} + x_2 - \bar{x} + \ldots + x_{10} - \bar{x} = (x_1 + x_2 + \ldots + x_{10}) - 10\bar{x}$ 

$= (x_1 + x_2 + \ldots + x_{10}) - 10 \cdot \frac{x_1 + x_2 + \ldots + x_{10}}{10} = 0$; 

Similarly: $\sum y = 0$; 

$\sum x^2$ = 6,244.10; 

$\sum y^2$ = 1,089.06; 

$\sum xy$ = 2,479.29; 
$r_{mean}$ = 0.95075. 



3.2) Replacing $x_i$, $y_i$ by their deviations from the smallest values, $x_i - x_{small}$ and $y_i - y_{small}$, 
we have a translation of axes again. 

x | 0 | 1 | 6 | 8 | 17 | 35 | 47 | 54 | 63 | 66 
y | 1.4 | 0 | 5.2 | 1 | 1.8 | 14.2 | 19.6 | 17.3 | 20.9 | 31.9 

Table 4 

$\sum x$ = 297; 

$\sum y$ = 113.3; 

$\sum x^2$ = 15,065; 
$\sum y^2$ = 2,372.75; 
$\sum xy$ = 5,844.30; 
$r_{(small)}$ = 0.95075. 



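3.3) Replacing $x_i$, $y_i$ by their deviations from the maximum, $x_{max} - x_i$ and $y_{max} - y_i$: 

x | 66 | 65 | 60 | 58 | 49 | 31 | 19 | 12 | 3 | 0 
y | 30.5 | 31.9 | 26.7 | 30.9 | 30.1 | 17.7 | 12.3 | 14.6 | 11 | 0 

Table 5 

$\sum x$ = 363; 

$\sum y$ = 205.7; 

$\sum x^2$ = 19,421; 

$\sum y^2$ = 5,320.31; 

$\sum xy$ = 9,946.20; 

$r_{(max)}$ = 0.95075. 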

3.4) Replacing $x_i$ by $x_i - x_4$ and $y_i$ by $y_i - y_4$ (in this case $k = 4$), $(x_4, y_4) = (14, 2.1)$: 

x | -8 | -7 | -2 | 0 | 9 | 27 | 39 | 46 | 55 | 58 
y | 0.4 | -1 | 4.2 | 0 | 0.8 | 13.2 | 18.6 | 16.3 | 19.9 | 30.9 

Table 6 

$\sum x$ = 217; 

$\sum y$ = 103.3; 

$\sum x^2$ = 10,953; 
$\sum y^2$ = 2,156.15; 
$\sum xy$ = 4,720.9; 

$r_4 = r_k = 0.95075$ for any $k \in \{1, 2, \ldots, 10\}$. 
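
The four replacements 3.1 to 3.4 can be checked together with a short Python sketch. It recomputes formula (9) on each transformed sample, including every choice of $k$ in case 3.4; the dictionary keys and function name are our own labels.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's r from raw sums, as in formula (9)."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    s_xy = sum(a * b for a, b in zip(xs, ys)) - sx * sy / n
    s_xx = sum(a * a for a in xs) - sx * sx / n
    s_yy = sum(b * b for b in ys) - sy * sy / n
    return s_xy / sqrt(s_xx * s_yy)


# Table 1 data
x = [6, 7, 12, 14, 23, 41, 53, 60, 69, 72]
y = [2.5, 1.1, 6.3, 2.1, 2.9, 15.3, 20.7, 18.4, 22, 33]
n = len(x)
r_p = pearson_r(x, y)

mean_x, mean_y = sum(x) / n, sum(y) / n
variants = {
    "3.1 mean": ([v - mean_x for v in x], [v - mean_y for v in y]),
    "3.2 minimum": ([v - min(x) for v in x], [v - min(y) for v in y]),
    "3.3 maximum": ([max(x) - v for v in x], [max(y) - v for v in y]),
}
# 3.4: shift by each observation (x_k, y_k) in turn
for k in range(n):
    variants[f"3.4 k={k + 1}"] = ([v - x[k] for v in x], [v - y[k] for v in y])

results = {name: pearson_r(xs, ys) for name, (xs, ys) in variants.items()}
for name, value in results.items():
    print(f"{name}: {value:.5f}")
```

Every line of the output should agree with Pearson’s coefficient for the original data up to floating-point rounding.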



Similarly, we get the same result, equal to $r_p$, if we replace in Pearson’s formula (9): 

3.5) Each $x_i$ by its deviation from $x$’s median, and each $y_i$ by its deviation from $y$’s 
median. 

3.6) Each $x_i$ by its deviation from $x$’s standard deviation, and each $y_i$ by its deviation 
from $y$’s standard deviation. 

3.7) Each $x_i$ by $x_i \pm a$ (where $a$ is any number), and each $y_i$ by $y_i \pm b$ (where $b$ is any 
number). 

3.8) Each $x_i$ by $x_i * a$ (where $a$ is any non-zero number and “*” is either division or 
multiplication), and each $y_i$ by $y_i * b$ (similarly for $b$ and “*”). The result equals $r_p$ when 
$a$ and $b$ have the same sign; if their signs differ, the result is $-r_p$. 

Since the cases 3.5 - 3.7 are similar to 3.1 - 3.4, let’s consider two examples for the case 3.8: 

3.8.1) Suppose each $x_i$ in the original example, Table 1, is divided by 5, while each $y_i$ is 
divided by 2. 

Then: $\sum x$ = 71.4; 

$\sum y$ = 62.15; 

$\sum x^2$ = 759.56; 

$\sum y^2$ = 658.528; 

$\sum xy$ = 691.68; 

$r_{(division, division)}$ = 0.95075. 

3.8.2) Now, let’s still divide each $x_i$ in Table 1 by 5, but this time multiply each $y_i$ by 2. 

Then: $\sum x$ = 71.4; 
$\sum y$ = 248.6; 

$\sum x^2$ = 759.56; 

$\sum y^2$ = 10,536.44; 
$\sum xy$ = 2,766.72; 

$r_{(division, multiplication)}$ = 0.95075. 

So, again these results coincide with Pearson’s. 
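
The two rescaling examples can be reproduced with the sketch below. The third case, multiplying $y$ by a negative factor, is not in the article; it is added as an assumption-free check of a standard property, namely that opposite signs of the two factors flip the sign of the coefficient.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's r from raw sums, as in formula (9)."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    s_xy = sum(a * b for a, b in zip(xs, ys)) - sx * sy / n
    s_xx = sum(a * a for a in xs) - sx * sx / n
    s_yy = sum(b * b for b in ys) - sy * sy / n
    return s_xy / sqrt(s_xx * s_yy)


# Table 1 data
x = [6, 7, 12, 14, 23, 41, 53, 60, 69, 72]
y = [2.5, 1.1, 6.3, 2.1, 2.9, 15.3, 20.7, 18.4, 22, 33]
r_p = pearson_r(x, y)

x_div5 = [v / 5 for v in x]
r_div_div = pearson_r(x_div5, [v / 2 for v in y])    # case 3.8.1
r_div_mul = pearson_r(x_div5, [v * 2 for v in y])    # case 3.8.2
r_div_neg = pearson_r(x_div5, [v * -2 for v in y])   # extra: factors of opposite sign

print(round(r_div_div, 5), round(r_div_mul, 5), round(r_div_neg, 5))
```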

More interesting alternative correlation coefficients, which give different results from 
Pearson’s and Spearman’s, are obtained as follows: 



A mixture of Pearson’s and Spearman’s correlation coefficients. 

4.1. We only replace $x_i$ by its rank among the $x$’s, while $y_i$ remains unchanged: 

x rank | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 
y | 2.5 | 1.1 | 6.3 | 2.1 | 2.9 | 15.3 | 20.7 | 18.4 | 22 | 33 

Table 7 

$\sum x$ = 55; 

$\sum y$ = 124.3; 

$\sum x^2$ = 385; 

$\sum y^2$ = 2,634.11; 

$\sum xy$ = 958.4; 

$r_{s,p}$ = 0.91661 $\in$ [0.90303, 0.95075]. 

4.2. Similarly, as above, let’s only replace $y_i$ by its rank among the $y$’s, while $x_i$ remains 
unchanged. 

x | 6 | 7 | 12 | 14 | 23 | 41 | 53 | 60 | 69 | 72 
y rank | 3 | 1 | 5 | 2 | 4 | 6 | 8 | 7 | 9 | 10 

Table 8 

$\sum x$ = 357; 

$\sum y$ = 55; 

$\sum x^2$ = 18,989; 

$\sum y^2$ = 385; 

$\sum xy$ = 2,636; 

$r_{p,s}$ = 0.93698 $\in$ [0.90303, 0.95075]. 




Both mixture correlation coefficients give results different from Pearson’s and 
Spearman’s; in this example they lie between the two. 
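
A minimal Python sketch of the two mixtures follows. The variable names `r_sp` and `r_ps` are our own shorthand for the two mixed coefficients, and the ranking helper assumes distinct values, as is the case for the Table 1 data.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's r from raw sums, as in formula (9)."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    s_xy = sum(a * b for a, b in zip(xs, ys)) - sx * sy / n
    s_xx = sum(a * a for a in xs) - sx * sx / n
    s_yy = sum(b * b for b in ys) - sy * sy / n
    return s_xy / sqrt(s_xx * s_yy)


def rank_without_ties(values):
    """Rank 1 = smallest value. Assumes all values are distinct."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]


# Table 1 data
x = [6, 7, 12, 14, 23, 41, 53, 60, 69, 72]
y = [2.5, 1.1, 6.3, 2.1, 2.9, 15.3, 20.7, 18.4, 22, 33]
rank_x, rank_y = rank_without_ties(x), rank_without_ties(y)

r_p = pearson_r(x, y)            # values vs values
r_s = pearson_r(rank_x, rank_y)  # ranks vs ranks
r_sp = pearson_r(rank_x, y)      # 4.1: ranks of x, values of y
r_ps = pearson_r(x, rank_y)      # 4.2: values of x, ranks of y

print(round(r_s, 5), round(r_sp, 5), round(r_ps, 5), round(r_p, 5))
```

The ordering observed here holds for this sample; the sketch does not establish that a mixed coefficient always falls between the other two.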

Conclusion 

In samples where the rank of a discrete variable matters more than its values, 
this mixture of correlation coefficients gives better results than Pearson’s or Spearman’s alone. 